Project Description

Data Description: You are provided with a dataset of images of plant seedlings at various stages of growth. Each image's filename serves as its unique ID. The dataset covers 12 plant species. The goal of the project is to build a classifier that determines a plant's species from a photo.

1. Import the libraries, load the dataset, print the shape of the data, and visualize the images in the dataset.

Import Libraries

Load dataset

Insights: There are 4,750 color images, each 128x128 pixels.

Visualize the images in dataset.

Insights: The dataset is imbalanced: some species have more images than others.
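One way to quantify the imbalance is to count images per species and compare the largest class with the smallest. This is a minimal sketch, assuming the labels are available as a list of species-name strings; the function names and the toy labels are hypothetical, not the real counts.

```python
from collections import Counter

def class_counts(labels):
    """Return (species, count) pairs sorted from most to least frequent."""
    return Counter(labels).most_common()

def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

# Toy labels for illustration only; not the dataset's real counts
labels = ["Black-grass"] * 2 + ["Loose Silky-bent"] * 6
print(class_counts(labels))     # [('Loose Silky-bent', 6), ('Black-grass', 2)]
print(imbalance_ratio(labels))  # 3.0
```

A ratio well above 1 suggests that accuracy alone may hide poor performance on the smaller classes.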

2. Data Pre-processing:

2a. Normalization.

Normalize the data in preparation for modeling. Pixel intensities range from 0 to 255, so dividing by 255 scales every value into the range [0, 1].
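A minimal sketch of this step, assuming the images are held in a NumPy `uint8` array of shape (N, 128, 128, 3); the `normalize` helper is hypothetical.

```python
import numpy as np

def normalize(images):
    """Scale uint8 pixel intensities from [0, 255] to floats in [0, 1]."""
    # Cast to float32 first so the result stays float32 instead of float64
    return images.astype(np.float32) / 255.0

# Toy batch: two 128x128 RGB images holding the extreme intensity values
batch = np.stack([np.zeros((128, 128, 3), np.uint8),
                  np.full((128, 128, 3), 255, np.uint8)])
x = normalize(batch)
print(x.dtype, x.min(), x.max())  # float32 0.0 1.0
```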

2b. Gaussian Blurring.

Apply a 5x5 Gaussian blur to smooth the images. Blurring suppresses high-frequency noise, so edges are less noisy, which should make it easier for the model to classify the images.
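In OpenCV this step is commonly written as `cv2.GaussianBlur(image, (5, 5), 0)`. The sketch below is a dependency-free NumPy approximation of a 5x5 Gaussian blur, so it can run without OpenCV. The sigma of 1.0 and the reflect padding are assumptions, and OpenCV's exact kernel weights may differ slightly. It relies on the fact that a 2-D Gaussian is separable, so one 1-D pass down the rows followed by one across the columns equals a full 5x5 convolution.

```python
import numpy as np

def gaussian_kernel_1d(ksize=5, sigma=1.0):
    """Normalized 1-D Gaussian weights centered on the middle tap."""
    x = np.arange(ksize) - (ksize - 1) / 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, ksize=5, sigma=1.0):
    """Blur an (H, W, C) image with a separable ksize x ksize Gaussian."""
    k = gaussian_kernel_1d(ksize, sigma)
    pad = ksize // 2
    img = image.astype(np.float32)
    h, w = img.shape[:2]
    # Reflect-pad height and width only; each channel is blurred independently
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # Weighted sum of shifted copies: first along rows, then along columns
    rows = sum(k[i] * padded[i:i + h, :, :] for i in range(ksize))
    out = sum(k[j] * rows[:, j:j + w, :] for j in range(ksize))
    return out.astype(np.float32)

image = np.random.default_rng(0).random((128, 128, 3)).astype(np.float32)
blurred = gaussian_blur(image)
print(blurred.shape)  # (128, 128, 3)
```

Because the kernel weights sum to 1, flat regions keep their brightness while pixel-to-pixel noise is averaged out.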

2c. Visualize data after pre-processing.

3. Make data compatible:

3a. Convert labels to one-hot-vectors.

Convert the labels to one-hot encoding. The CNN's output layer has 12 neurons, one per class, so each label must become a 12-element vector with a 1 in the position of its class and 0 elsewhere.
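A minimal NumPy sketch of one-hot encoding, assuming the labels are species-name strings; the `one_hot` helper is hypothetical. Library routines such as `keras.utils.to_categorical` (for integer labels) or scikit-learn's `LabelBinarizer` do the same job. Taking `argmax` of a row recovers its class, which is how a label such as y[0] can be printed as a species name.

```python
import numpy as np

def one_hot(labels):
    """Map species names to one-hot rows; columns follow sorted class names."""
    classes, index = np.unique(np.asarray(labels), return_inverse=True)
    y = np.zeros((len(labels), len(classes)), dtype=np.float32)
    y[np.arange(len(labels)), index] = 1.0
    return y, classes

labels = ["Loose Silky-bent", "Black-grass", "Loose Silky-bent"]
y, classes = one_hot(labels)
print(y[0])                      # [0. 1.]
print(classes[np.argmax(y[0])])  # Loose Silky-bent
```

With all 12 species present, each row of `y` would have 12 entries, matching the 12 output neurons.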

3b. Print the label for y[0].

3c. Split the dataset into training, validation, and test sets.
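A minimal sketch of a shuffled three-way split. The 70/15/15 fractions and the seed are assumptions, since the notebook does not state its ratios. Because the dataset is imbalanced, a stratified split, such as scikit-learn's `train_test_split(..., stratify=labels)` applied twice, would keep each species' proportion similar across the three sets.

```python
import numpy as np

def train_val_test_split(x, y, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then carve off test and validation slices (not stratified)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_frac))
    n_val = int(round(len(x) * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return ((x[train_idx], y[train_idx]),
            (x[val_idx], y[val_idx]),
            (x[test_idx], y[test_idx]))

# Toy data: 100 samples whose feature equals their label, to check alignment
x = np.arange(100).reshape(100, 1)
y = np.arange(100)
train, val, test = train_val_test_split(x, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 70 15 15
```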

3d. Check the shape of the data and reshape it into a shape compatible with Keras models if it is not already. If it is already in a compatible shape, note that in the notebook.

Insights: The data is already in a compatible shape because the images are in color: each one is 128x128 with 3 channels. Each label in y is a 12-element vector, one position per class. The data is ready for modeling.
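Keras `Conv2D` layers in the default channels-last format expect a 4-D input of shape (batch, height, width, channels). A minimal sketch of the check, with a hypothetical `ensure_keras_shape` helper that only reshapes when needed, for example when grayscale images lack a channel axis:

```python
import numpy as np

def ensure_keras_shape(x):
    """Return images as (N, H, W, C), adding a channel axis to grayscale data."""
    if x.ndim == 3:   # (N, H, W) grayscale -> (N, H, W, 1)
        return x[..., np.newaxis]
    if x.ndim == 4:   # already (N, H, W, C), e.g. (N, 128, 128, 3) color
        return x
    raise ValueError(f"expected a 3-D or 4-D image array, got shape {x.shape}")

color = np.zeros((5, 128, 128, 3), np.float32)
gray = np.zeros((5, 128, 128), np.float32)
print(ensure_keras_shape(color).shape)  # (5, 128, 128, 3)
print(ensure_keras_shape(gray).shape)   # (5, 128, 128, 1)
```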

4. Building CNN:

4a. Define layers.

4b. Set optimizer and loss function. (Use Adam optimizer and categorical crossentropy.)

- Define a Sequential model.
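A minimal Keras sketch of such a model. Only the Sequential structure, the Adam optimizer, categorical crossentropy, the 128x128x3 input, and the 12 softmax outputs come from the steps above. The number of convolutional blocks, filter counts, dense width, and learning rate are assumptions, not the notebook's actual architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(128, 128, 3), num_classes=12):
    """A small CNN: three conv/pool blocks, then a dense softmax classifier."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # One output neuron per species; softmax turns scores into probabilities
        layers.Dense(num_classes, activation="softmax"),
    ])
    # categorical_crossentropy expects one-hot targets, matching step 3a
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
probs = model.predict(np.zeros((2, 128, 128, 3), np.float32), verbose=0)
print(probs.shape)  # (2, 12)
```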

5. Fit and evaluate the model, and print the confusion matrix.
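In Keras, training and evaluation are typically `model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=..., batch_size=...)` followed by `model.evaluate(X_test, y_test)`; the variable names and hyperparameters there are placeholders. The confusion matrix then compares `argmax` of the one-hot test labels with `argmax` of `model.predict(X_test)`. The sketch below builds it in NumPy on toy data; `sklearn.metrics.confusion_matrix` is the usual library alternative.

```python
import numpy as np

def confusion_matrix(y_true_onehot, y_prob):
    """Rows are true classes, columns are predicted classes."""
    true_idx = np.argmax(y_true_onehot, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    n = y_true_onehot.shape[1]
    cm = np.zeros((n, n), dtype=np.int64)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat
    np.add.at(cm, (true_idx, pred_idx), 1)
    return cm

# Toy stand-ins for y_test and model.predict(X_test) with 3 classes
y_true = np.eye(3)[[0, 0, 1, 2, 2]]
y_prob = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.0, 0.3, 0.7],
                   [0.1, 0.1, 0.8]])
cm = confusion_matrix(y_true, y_prob)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())  # accuracy: 0.8
```

The diagonal holds correct predictions, so accuracy is the trace divided by the total count.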

Insights: The model performed well given the small dataset. There is slight overfitting, but nothing major.

Insights: On the test set, the model reaches 80% accuracy, roughly the same as on the training and validation sets. Training the model longer, adding dropout layers, and using more data augmentation could help improve accuracy.
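The augmentation idea can be sketched with simple geometric transforms. Choosing random flips and 90-degree rotations is an assumption: it presumes that photos of seedlings have no fixed "up" direction, which the notebook does not state. Keras provides comparable preprocessing layers (`layers.RandomFlip`, `layers.RandomRotation`), and dropout is available as `layers.Dropout`. The helpers below are hypothetical.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip a square (H, W, C) image and rotate it by a multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)   # horizontal flip
    if rng.random() < 0.5:
        image = np.flip(image, axis=0)   # vertical flip
    k = int(rng.integers(0, 4))          # 0, 90, 180, or 270 degrees
    return np.rot90(image, k=k, axes=(0, 1))

def augment_batch(images, seed=0):
    """Apply an independent random augmentation to each image in the batch."""
    rng = np.random.default_rng(seed)
    return np.stack([augment(img, rng) for img in images])

batch = np.random.default_rng(1).random((4, 128, 128, 3)).astype(np.float32)
augmented = augment_batch(batch)
print(augmented.shape)  # (4, 128, 128, 3)
```

These transforms only rearrange pixels, so labels stay valid and the shape stays 128x128x3 because the images are square.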

Insights: The model had difficulty distinguishing Black-grass from Loose Silky-bent, but performed well on the other 10 species.
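A confusion like this can be located directly in the confusion matrix: per-class recall shows which species are often missed, and the largest off-diagonal cell names the most frequently confused pair. A minimal sketch with hypothetical helpers; the 3-class matrix and the "Other" class are toy values, not the notebook's results.

```python
import numpy as np

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

def most_confused_pair(cm, class_names):
    """Return (true name, predicted name, count) for the largest off-diagonal cell."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    t, p = np.unravel_index(np.argmax(off), off.shape)
    return class_names[t], class_names[p], int(off[t, p])

# Toy matrix for illustration only ("Other" is a hypothetical class)
names = ["Black-grass", "Loose Silky-bent", "Other"]
cm = np.array([[5, 9, 1],
               [4, 30, 0],
               [0, 1, 40]])
print(per_class_recall(cm))
print(most_confused_pair(cm, names))  # ('Black-grass', 'Loose Silky-bent', 9)
```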

6. Visualize the predictions for x_test[2], x_test[3], x_test[33], x_test[36], and x_test[59].
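A minimal sketch of the bookkeeping behind this step, assuming `probs = model.predict(x_test)` and one-hot `y_test`. The hypothetical `describe_predictions` helper pairs each requested index with its predicted and true species. In the notebook, the indices would be [2, 3, 33, 36, 59], and each image could then be shown with matplotlib's `plt.imshow(x_test[i])`, using the returned names as the title.

```python
import numpy as np

def describe_predictions(probs, y_true_onehot, class_names, indices):
    """Return (index, predicted name, true name, confidence) for each index."""
    rows = []
    for i in indices:
        pred = int(np.argmax(probs[i]))
        true = int(np.argmax(y_true_onehot[i]))
        rows.append((i, class_names[pred], class_names[true], float(probs[i, pred])))
    return rows

# Toy stand-ins for model.predict(x_test) and y_test
names = ["Black-grass", "Loose Silky-bent"]
probs = np.array([[0.2, 0.8], [0.9, 0.1], [0.6, 0.4]])
y_test = np.eye(2)[[1, 0, 1]]
for row in describe_predictions(probs, y_test, names, [0, 2]):
    print(row)
```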